Programming Massively Parallel Processors: A Hands-on Approach: Breaking the Sequential Bottleneck

The End of the "Free Lunch"

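For decades, developers enjoyed a "free lunch". Dennard scaling let each new generation of chips run at a higher clock speed, so the same sequential code simply ran faster. That era ended at the power wall. Rising power density and heat stopped frequency scaling, and single-threaded performance hit the "Sequential Ceiling". Performance no longer comes from frequency; it comes from concurrency. To move forward, we must apply computational thinking to bridge the gap between abstract numerical methods and modern parallel execution models.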

The Tension Between Precision and Performance

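Moving a domain problem such as molecular dynamics from a multicore host to a CUDA device involves more than a change of syntax. It requires a different problem decomposition. Parallelizing a computation often changes the order in which operations are performed. Because floating-point arithmetic is not associative, that reordering changes how rounding errors accumulate. The result is a trade-off between performance and floating-point precision and accuracy. A parallel result can be just as mathematically valid as the sequential one and still differ from it numerically.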

QUESTION 1

What is the primary reason the 'Sequential Ceiling' was reached?

The end of Moore's Law entirely.

Thermal limits and the Power Wall hindering frequency scaling.

Lack of developer interest in C++.

The transition to quantum computing.

QUESTION 2

According to Amdahl's Law, if 5% of a program is strictly sequential, what is the maximum theoretical speedup?

Infinite speedup.

Approximately 20x.

5x.

100x.

QUESTION 3

Why might a parallel Molecular Dynamics simulation yield slightly different results than a sequential one?

The CPU uses 64-bit while the GPU only uses 8-bit.

Floating-point addition is non-associative, and parallel execution reorders it.

Parallel threads randomly skip calculations.

The CUDA compiler ignores numerical methods.

QUESTION 4

What does 'Problem Decomposition' involve in the context of parallel programming?

Breaking code into functions for readability.

Mapping domain-specific data to parallel execution models like threads or grids.

Deleting unnecessary variables to save memory.

Compiling the code for multiple OS targets.

QUESTION 5

Which of the following describes the 'Computational Thinking' bridge?

A hardware component between the CPU and GPU.

A framework to translate domain knowledge into architecture-aware algorithms.

An automated AI tool that writes CUDA kernels.

The process of upgrading RAM on a host machine.